

## WHAT IS CLAIMED IS:

1. A computer implemented method for scheduling processor jobs on a network of parallel machine processors or distributed system processors, comprising the steps of:

accumulating in buffers control information communications generated by each process performed by each processor during a defined time interval, where adjacent time intervals are separated by intervening strobe intervals for a global exchange of control information; and

performing a global exchange of the control information communications at the end of each defined time interval during the intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval.
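
As an illustration only, the accumulate-then-exchange cycle recited in Claim 1 can be sketched in Python. The names `Processor`, `post`, and `strobe_exchange` are hypothetical and do not appear in the claims, and the sketch simulates all processors inside one program rather than over a real network.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical stand-in for one processor in the network."""
    pid: int
    outgoing: list = field(default_factory=list)  # control info buffered this interval
    incoming_count: int = 0                       # jobs expected in the next interval

    def post(self, dest: int) -> None:
        # Accumulate the control information instead of sending it immediately.
        self.outgoing.append(dest)


def strobe_exchange(processors: list) -> None:
    """Global exchange performed during the strobe interval (simulated)."""
    counts = defaultdict(int)
    for p in processors:
        for dest in p.outgoing:
            counts[dest] += 1
        p.outgoing.clear()  # buffers start empty for the next interval
    for p in processors:
        p.incoming_count = counts[p.pid]


procs = [Processor(pid) for pid in range(3)]
procs[0].post(2)
procs[1].post(2)
procs[2].post(0)
strobe_exchange(procs)
print([p.incoming_count for p in procs])  # prints [1, 0, 2]
```

After the exchange, every processor knows how many incoming jobs to expect in the subsequent interval, which is the only property the sketch is meant to show.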

2. The computer implemented method according to Claim 1, further including the steps of:

running on each processor an ongoing process in the presence of a non-blocking communication call and storing the information relative to the communication in a descriptor;

yielding the processor to an operating system in the presence of a blocking communication call and storing the information in a descriptor while suspending the ongoing process and activating a ready process from a ready queue, if any; and

putting the ongoing process on the ready queue when the blocking communication call is completed.
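
The handling of blocking and non-blocking calls in Claim 2 might be sketched as follows. The `Kernel` class, its field names, and the dictionary layout of a descriptor are assumptions made for the example and are not taken from the claims.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Kernel:
    """Hypothetical per-processor scheduling state."""
    running: Optional[str]                          # name of the ongoing process
    ready: deque = field(default_factory=deque)     # ready queue
    suspended: set = field(default_factory=set)     # processes waiting on a blocking call
    descriptors: list = field(default_factory=list)

    def communicate(self, descriptor: dict, blocking: bool) -> None:
        # In both cases the communication is only recorded in a descriptor.
        self.descriptors.append(descriptor)
        if not blocking:
            return  # the ongoing process keeps running
        # Blocking call: suspend the caller and activate a ready process, if any.
        self.suspended.add(self.running)
        self.running = self.ready.popleft() if self.ready else None

    def complete(self, process: str) -> None:
        # The blocking call has completed: the process goes on the ready queue.
        self.suspended.discard(process)
        self.ready.append(process)


k = Kernel(running="A", ready=deque(["B"]))
k.communicate({"type": "send", "dst": 1}, blocking=False)  # "A" keeps running
k.communicate({"type": "recv", "src": 1}, blocking=True)   # "A" suspended, "B" runs
k.complete("A")                                            # "A" is ready again
print(k.running, list(k.ready))  # prints B ['A']
```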

3. The computer implemented method according to Claim 1, further including the steps of:

issuing a download command to each processor at the beginning of a strobe interval;

downloading from each processor at the command of each processor kernel at the beginning of a strobe interval descriptor control packets into the network for a total exchange between all processors so that each processor is informed of the number of incoming communications to be received in the succeeding time interval; and

scheduling by each processor kernel of communications accumulated prior to the strobe interval to be delivered in the succeeding time interval.

4. The computer implemented method according to Claim 2, wherein each descriptor includes an identification of the type of communication, the sending and receiving processors, and the virtual addresses of the buffers.

5. The computer implemented method according to Claim 3, wherein each descriptor includes an identification of the type of communication, the sending and receiving processors, and the virtual addresses of the buffers.
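
One possible in-memory layout for the descriptor of Claims 4 and 5 is sketched below. The claims name only the kinds of information held (type of communication, sending and receiving processors, virtual addresses of the buffers); the field names, the `CommType` values, and the choice of exactly two buffer addresses are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class CommType(Enum):
    """Hypothetical set of communication types."""
    SEND = "send"
    RECEIVE = "receive"


@dataclass(frozen=True)
class Descriptor:
    comm_type: CommType    # identification of the type of communication
    sender: int            # sending processor
    receiver: int          # receiving processor
    send_buffer_addr: int  # virtual address of the source buffer (assumed field)
    recv_buffer_addr: int  # virtual address of the destination buffer (assumed field)


d = Descriptor(CommType.SEND, sender=0, receiver=3,
               send_buffer_addr=0x1000, recv_buffer_addr=0x2000)
print(d.comm_type.value, d.sender, d.receiver)  # prints send 0 3
```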

6. A computer implemented method for enhancing fault tolerance in a network of parallel machine processors or distributed system processors, comprising the steps of:

checkpointing each processor at the end of a defined time interval to store a checkpointed status;

identifying faults in a strobe time interval following the defined time interval;

reconfiguring the system to remove any identified faults;

updating the checkpointed status with reconfigured system information; and

restarting processing with the updated status.
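
The checkpoint, fault identification, reconfiguration, and restart sequence of Claim 6 could look roughly like the sketch below. The state layout, the functions `run_interval` and `recover`, and the hard-coded fault set are hypothetical. Discarding the progress of a faulty processor is a simplification chosen for the example; the claim does not say what happens to that work.

```python
import copy


def run_interval(state: dict) -> None:
    """Hypothetical unit of work: every live processor advances one step."""
    for pid in state["live"]:
        state["progress"][pid] = state["progress"].get(pid, 0) + 1


def recover(checkpoint: dict, faulty: set) -> dict:
    """Reconfigure around faulty processors and return the updated checkpoint."""
    updated = copy.deepcopy(checkpoint)
    updated["live"] = [pid for pid in updated["live"] if pid not in faulty]
    for pid in faulty:
        updated["progress"].pop(pid, None)  # simplification: drop the faulty node's work
    return updated


state = {"live": [0, 1, 2], "progress": {}}
run_interval(state)
checkpoint = copy.deepcopy(state)    # checkpoint at the end of the defined time interval
faulty = {1}                         # assumed result of fault identification in the strobe interval
state = recover(checkpoint, faulty)  # reconfigure and update the checkpointed status
run_interval(state)                  # restart processing with the updated status
print(state)  # prints {'live': [0, 2], 'progress': {0: 2, 2: 2}}
```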

7. The computer implemented method according to Claim 6, where the step of checkpointing each processor further comprises:

compiling the status of a process performed by each processor during the defined time interval; and

forming and storing a memory image of processes and descriptors of all processors at the end of the defined time interval.
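
A minimal sketch of the checkpointing step in Claim 7 follows, assuming that the status of each processor can be represented as plain Python data. The function `form_memory_image`, the dictionary layout, and the use of `pickle` as the storage format are choices made for the example, not something the claim specifies.

```python
import pickle


def form_memory_image(processors: dict) -> bytes:
    """Compile process status and descriptors of all processors into one image."""
    image = {
        pid: {"processes": p["processes"], "descriptors": p["descriptors"]}
        for pid, p in processors.items()
    }
    # Serializing produces an independent snapshot of the current state.
    return pickle.dumps(image)


processors = {
    0: {"processes": {"A": "running"}, "descriptors": [{"type": "send", "dst": 1}]},
    1: {"processes": {"B": "ready"}, "descriptors": []},
}
image = form_memory_image(processors)        # stored at the end of the interval
processors[0]["processes"]["A"] = "crashed"  # later changes do not alter the image
restored = pickle.loads(image)
print(restored[0]["processes"]["A"])  # prints running
```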